Prediction Using Note Text: Synthetic Feature Creation with word2vec
Authors
Abstract
word2vec affords a simple yet powerful approach to extracting quantitative variables from unstructured textual data. Over half of healthcare data is unstructured (1) and is therefore hard to model without substantial expertise in data engineering and natural language processing. word2vec can serve as a bridge for quickly gathering intelligence from such data sources. In this study, we ran 650 megabytes of unstructured medical chart notes from the Providence Health & Services electronic medical record through word2vec. We used two different approaches to create predictive variables and tested them on predicting the risk of readmission for patients with COPD (Chronic Obstructive Pulmonary Disease). As a comparative benchmark, we ran the same test using the LACE risk model (2) (a single score based on length of stay, acuity, comorbid conditions, and emergency department visits). Using only free text and mathematical might, we found word2vec comparable to LACE in predicting the risk of readmission of COPD patients.
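The abstract does not say which two approaches were used to turn word vectors into predictive variables. As a minimal sketch of one common option, assuming each note is represented by the average of its word vectors, the Python below builds a fixed-length feature row per note. The vectors in `TOY_VECTORS`, the helper `note_to_features`, and the example notes are all hypothetical stand-ins for illustration, not the study's data or method.

```python
# Hypothetical 3-dimensional word vectors standing in for a trained
# word2vec model; a real model would have hundreds of dimensions.
TOY_VECTORS = {
    "copd": [0.9, 0.1, 0.0],
    "wheezing": [0.8, 0.2, 0.1],
    "oxygen": [0.7, 0.0, 0.3],
    "stable": [0.0, 0.9, 0.1],
    "discharged": [0.1, 0.8, 0.2],
}


def note_to_features(note, vectors, dim=3):
    """Average the vectors of all known words in a note.

    Returns a list of `dim` floats, so every note yields a feature row
    of the same length regardless of how many words it contains.
    """
    tokens = note.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No vocabulary overlap: fall back to an all-zero row.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    notes = [
        "Patient stable discharged",
        "COPD wheezing oxygen",
    ]
    for note in notes:
        print(note, "->", note_to_features(note, TOY_VECTORS))
```

Rows like these could then be passed, together with a readmission label, to any standard classifier. That downstream step is also an assumption here, since the abstract only reports that the resulting variables were tested on readmission risk.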
Similar resources
A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy
Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manually annotating large-scale unstructured data becomes a challenge. Currently, many state-of-the-art text mining methods have been applied to the classification process, many of them based on the key wor...
Sentence Based Discourse Classification for Hindi Story Text-to-Speech (TTS) System
In this work, we propose an automatic discourse prediction model. It predicts the discourse information for a sentence. In this study, the three discourse modes considered are descriptive, narrative and dialogue. The proposed model is developed using a story corpus. The story corpus comprises audio and the corresponding text transcriptions of short children's stories. The development of this mo...
Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...
Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval
Cross-modal information retrieval aims to find heterogeneous data of various modalities from a given query of one modality. The main challenge is to map different modalities into a common semantic space, in which the distance between concepts in different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses off-the-shelf Convolutio...
Learning Stylometric Representations for Authorship Analysis
Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of exponentially exploding textual data. It extracts an author’s identity and sociolinguistic characteristics based on the reflected writing styles in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguistics, political socialization, etc. However, mo...
Journal: CoRR
Volume: abs/1503.05123
Publication date: 2015